Issue Info:
  • Year: 2021
  • Volume: 32
  • Issue: 1
  • Pages: 133-141
Measures:
  • Citations: 0
  • Views: 25
  • Downloads: 12
Abstract: 

Computation in today's digital world is demanding: a wide variety of applications must process and store enormous datasets. Most of this data is unstructured, is generated at high velocity, and grows in volume day by day. Over the last decade, many organizations have struggled to handle and process massive chunks of data, which could not be processed efficiently because existing, conventional technologies had not been sufficiently enhanced. This paper addresses how to overcome these problems efficiently using a recent and powerful data processing tool, the open-source Hadoop, and one of its core components, MapReduce, which nevertheless has some performance issues. The paper's main goal is to address and overcome the limitations and weaknesses of MapReduce with Apache Spark.

Yearly Impact: Scientific Information Database (SID) - Trusted Source for Research and Academic Resources

Issue Info:
  • Year: 2021
  • Volume: 7
Measures:
  • Views: 95
  • Downloads: 0
Abstract: 

The amount of data on the Internet is growing sharply. Some data, such as log files, are enormous and contain valuable hidden patterns. A log file is a set of recorded events that carry information useful for improving web server performance, stabilizing and controlling server load, and speeding up responses to users. However, analyzing massive data takes a long time and requires powerful hardware. In addition, sequential pattern mining methods usually perform unsatisfactorily on such data. This paper proposes a novel parallel method, built on the Apache Spark platform, for finding log file patterns such as frequent patterns (e.g., URL, IP, status code), how users accessed files, the number of errors, and the most common errors. Experimental results demonstrate that the proposed method's run time on three datasets is significantly less than that of four rival pattern mining methods.
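The kind of log statistics listed in this abstract can be illustrated with a minimal single-machine sketch. It is not the paper's Spark method: the Common Log Format layout, the names `parse_line`, `frequent_values`, and `error_summary`, and the sample lines are all assumptions made for illustration. On a cluster, the same per-line parse followed by a count per key would be distributed across workers.

```python
import re
from collections import Counter

# Assumed layout: Common Log Format. The paper does not state its log schema.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

def parse_line(line):
    """Return a dict with ip, method, url, status, or None if the line is malformed."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None

def frequent_values(lines, field, min_count):
    """Count one field (e.g. 'url', 'ip', 'status') and keep values seen at least min_count times."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is not None:
            counts[rec[field]] += 1
    return {value: c for value, c in counts.items() if c >= min_count}

def error_summary(lines):
    """Return (total number of 4xx/5xx responses, [(most common error code, count)])."""
    errors = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is not None and rec["status"][0] in "45":
            errors[rec["status"]] += 1
    return sum(errors.values()), errors.most_common(1)

# Hypothetical sample data.
SAMPLE = [
    '10.0.0.1 - - [01/Jan/2021:00:00:01 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2021:00:00:02 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.1 - - [01/Jan/2021:00:00:03 +0000] "GET /missing HTTP/1.1" 404 0',
    '10.0.0.3 - - [01/Jan/2021:00:00:04 +0000] "POST /login HTTP/1.1" 500 0',
    '10.0.0.1 - - [01/Jan/2021:00:00:05 +0000] "GET /missing HTTP/1.1" 404 0',
]

if __name__ == "__main__":
    print(frequent_values(SAMPLE, "ip", 2))
    print(frequent_values(SAMPLE, "url", 2))
    print(error_summary(SAMPLE))
```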

Issue Info:
  • Year: 2017
  • Volume: 3
Measures:
  • Views: 160
  • Downloads: 82
Abstract: 

RDF models are widely used in the Web of Data due to their flexibility and similarity to graph patterns. As the use of RDF grows, its volume and content are increasing. Processing such amounts of data on a single machine is therefore not efficient enough, because of response time and limited hardware resources. As a result, cluster processing has been introduced to process this data model. One such cluster processing tool is Apache Hadoop, but because it relies heavily on hard disks, its response time is usually unacceptable. In this paper, to address this problem, we use Apache Spark for rapid processing of RDF data models. One key feature of Apache Spark is that it uses main memory instead of hard disk, so data processing is faster. We then run SQL queries on RDF data that is partitioned across the cluster.

Issue Info:
  • Year: 2022
  • Volume: 9
  • Issue: 1
  • Pages: 57-69
Measures:
  • Citations: 0
  • Views: 20
  • Downloads: 2
Abstract: 

The increasing use of the Internet and web services, the advent of fifth-generation (5G) cellular network technology, and ever-growing Internet of Things (IoT) data traffic will increase global internet usage. To ensure the security of future networks, machine learning-based intrusion detection and prevention systems (IDPS) must be implemented to detect new attacks, and big data parallel processing tools can be used to handle the huge collections of training data in these systems. In this paper, Apache Spark, a fast, general-purpose cluster computing platform, is used to process and train on a large volume of network traffic feature data. The most important features of the CSE-CIC-IDS2018 dataset are used to construct machine learning models, and then the most popular machine learning approaches, namely Logistic Regression, Support Vector Machine (SVM), three different Decision Tree classifiers, and the Naive Bayes algorithm, are used to train the model on up to eight worker nodes. Our Spark cluster contains seven machines acting as worker nodes and one machine configured as both a master and a worker. We use the CSE-CIC-IDS2018 dataset to evaluate the overall performance of these algorithms on Botnet attacks, and distributed hyperparameter tuning is used to find the best parameters for a single decision tree. We achieved up to 100% accuracy in our experiments using features selected by the learning method.

Issue Info:
  • Year: 2023
  • Volume: 9
Measures:
  • Views: 41
  • Downloads: 28
Abstract: 

Using a mix of machine learning algorithms and big data tools, particularly Apache Spark and Apache Kafka, this research provides a new method for real-time blood pressure prediction. The method can handle large amounts of inbound data from numerous sources, including wearable technology and Internet of Things monitors. A clustering-based approach is used to improve the precision of the blood pressure estimate while the data is analyzed in real time. A dataset of ECG, PPG, and ABP signals is used to assess the suggested strategy, and the findings show a substantial improvement in blood pressure prediction accuracy compared to previous methods. The suggested method has potential applications in remote patient tracking, individualized healthcare, and early detection of cardiovascular disease. This research offers two contributions. First, it introduces a novel technique for real-time blood pressure forecasting that is more accurate than current approaches. Second, it shows the value of merging machine learning techniques with real-time streaming data processing systems such as Apache Spark and Apache Kafka. The use of web-based tools and deep learning methods further improves the scalability and accuracy of the system. The suggested method may have a significant impact on patient outcomes and treatment costs. Overall, this research offers a path that can be useful to both individuals and healthcare professionals for creating real-time blood pressure forecasting tools.

Author(s): 

Riza Lala Septem | Nurfathiya Muhammad Ilham | Kusnendar Jajang | Abu Samah Khyrina Airin Fariza

Issue Info:
  • Year: 2021
  • Volume: 12
  • Issue: Special Issue
  • Pages: 1561-1572
Measures:
  • Citations: 0
  • Views: 27
  • Downloads: 4
Abstract: 

The objective of this research is to design and implement a computational model that determines DNA barcodes using the Particle Swarm Optimization (PSO) algorithm implemented on big data platforms, namely Apache Hadoop and Apache Spark. The steps are as follows: (i) inputting DNA sequences to the Hadoop Distributed File System (HDFS) in Apache Hadoop, (ii) pre-processing the data, (iii) implementing PSO using a User Defined Function (UDF) in Apache Spark, and (iv) collecting the results and saving them to HDFS. With the computational model in place, two simulations were run: the first scenario uses 4 cores and several worker nodes, while the second uses a cluster with 2 worker nodes and several cores. In terms of computational time, the results show a significant acceleration of the big data platform over the standalone setup in both experimental scenarios. The study shows that the computational model built on the big data platform adds features to, and runs faster than, previous research.
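For readers unfamiliar with PSO, the optimization loop named in step (iii) can be sketched generically. This is a plain single-machine version minimizing a toy objective, not the paper's DNA barcode fitness function or its Spark UDF. The function names, the inertia and acceleration coefficients (`w`, `c1`, `c2`), and the `sphere` objective are assumptions chosen for illustration.

```python
import random

def pso(objective, dim, bounds, n_particles=30, n_iters=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box with a basic global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position; the swarm shares a global best.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity = inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def sphere(x):
    """Toy objective with its minimum of 0 at the origin (an assumption, not the paper's fitness)."""
    return sum(v * v for v in x)

if __name__ == "__main__":
    best_pos, best_val = pso(sphere, dim=2, bounds=(-5.0, 5.0))
    print(best_pos, best_val)
```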

Issue Info:
  • Year: 2019
  • Volume: 2
Measures:
  • Views: 148
  • Downloads: 0
Abstract: 

Today, there are many data sources due to the increased use of smart devices, which are responsible for connecting, collecting, exchanging, and transmitting this volume of data. In addition, research shows that by 2030 about a trillion sensors will be connected to the Internet of Things, collecting and transmitting a large amount of data. Therefore, there is a need to use big data applications in the IoT. These technologies are interdependent and must be developed together. In this paper, we review some of the challenges and issues of big data, as well as work done by other researchers. We then examine and compare the two major big data frameworks, and finally examine the requirements and review the analytical solutions in this area.

Journal: نقشه برداری (Surveying)

Issue Info:
  • Year: 1383
  • Volume: -
  • Issue: 1
  • Pages: 5-12
Measures:
  • Citations: 1
  • Views: 230
  • Downloads: 0
Keywords: 
Abstract: 

Issue Info:
  • Year: 2019
  • Volume: 7
  • Issue: 2
  • Pages: 239-247
Measures:
  • Citations: 0
  • Views: 197
  • Downloads: 122
Abstract: 

Because fraudsters understand the time windows and act fast, real-time fraud management systems have become necessary in the telecommunication industry. In this work, an online behavior-based anomaly detection system is built by analyzing traces collected from a nationwide cellular network over a period of one month. Over time, users' interactions with the network produce a vast amount of usage data. This usage data is modeled as profiles by which users can be identified. A statistical model is proposed that assigns a risk number to each incoming record, revealing its deviation from the normal behavior stored in the profiles. Based on the size of this deviation, the record is flagged as normal or abnormal. If the activity is normal, the associated profile is updated; otherwise, the record is flagged as abnormal and considered for further investigation. To handle the big dataset and implement the methodology, we used the Apache Spark engine, an open-source, fast, general-purpose cluster computing system for big data handling and analysis. The experimental results show that the proposed approach can perfectly detect deviations from normal behavior and can be exploited for detecting anomaly patterns.
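The profile, score, flag-or-update loop described in this abstract can be sketched as follows. The abstract does not specify its statistical model, so this sketch assumes a simple z-score against a running mean and standard deviation as the risk number. The `Profile` class, the `process` function, and the threshold of 3.0 are all hypothetical.

```python
import math

class Profile:
    """Running mean/variance of one user's usage (Welford's online algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the mean

    def update(self, x):
        """Fold a record judged normal into the stored behavior."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def risk(self, x):
        """Assumed risk number: distance from the mean in standard deviations."""
        if self.n < 2:
            return 0.0  # not enough history to judge
        std = math.sqrt(self.m2 / (self.n - 1))
        if std == 0.0:
            return 0.0 if x == self.mean else math.inf
        return abs(x - self.mean) / std

def process(profile, x, threshold=3.0):
    """Flag the record; update the profile only when the record looks normal."""
    if profile.risk(x) > threshold:
        return "abnormal"
    profile.update(x)
    return "normal"

if __name__ == "__main__":
    p = Profile()
    for usage in [10, 12, 11, 9, 10, 100]:  # hypothetical usage values
        print(usage, process(p, usage))
```

Leaving abnormal records out of the profile keeps a single outlier from widening what counts as normal, which matches the abstract's rule of updating only on normal activity.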

Issue Info:
  • Year: 2021
  • Volume: 7
Measures:
  • Views: 80
  • Downloads: 0
Abstract: 

The data generated today is immense in volume, velocity, and variety. This has created a great challenge for scientists and researchers, who have suggested a variety of schemes to help alleviate the problem. One of the suggested schemes is Association Rule Mining, which focuses primarily on finding associations in transaction-like data. To find such associations, Frequent Itemsets must be discovered first. This research therefore presents a new approach to finding Frequent Itemsets, based on the Apriori algorithm and the Apache Spark distributed platform. Further, we introduce an extended version of Apriori that finds Maximal Frequent Itemsets first to help speed up the mining process. The results, compared against algorithms such as YAFIM and HFIM and the original Apriori, show that the suggested algorithm outperforms them on dense datasets by an average of 38 percent.
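As background for the terms used in this abstract, the following is a minimal single-machine sketch of the original Apriori algorithm, with a post-hoc filter for maximal frequent itemsets. It is not the paper's extended algorithm, which searches for maximal itemsets first and runs on Spark. The function names and the sample transactions are assumptions for illustration, and `min_support` is taken to be an absolute count.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset seen in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    items = {i for t in transactions for i in t}
    level = {s for s in (frozenset([i]) for i in items) if support(s) >= min_support}

    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level for sub in combinations(c, k - 1))}
        level = {c for c in candidates if support(c) >= min_support}
    return frequent

def maximal(frequent):
    """Keep only frequent itemsets that have no frequent proper superset."""
    return {s for s in frequent if not any(s < other for other in frequent)}

# Hypothetical transactions.
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]

if __name__ == "__main__":
    freq = apriori(TRANSACTIONS, min_support=3)
    print(freq)
    print(maximal(freq))
```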
